Learning from Millions of 3D Scans for Large-scale 3D Face Recognition
Deep networks trained on millions of facial images are believed to be closely
approaching human-level performance in face recognition. However, open-world
face recognition still remains a challenge. Although 3D face recognition has
an inherent edge over its 2D counterpart, it has not benefited from the recent
developments in deep learning due to the unavailability of large training as
well as large test datasets. Recognition accuracies have already saturated on
existing 3D face datasets due to their small gallery sizes. Unlike 2D
photographs, 3D facial scans cannot be sourced from the web, causing a
bottleneck in the development of deep 3D face recognition networks and
datasets. Against this backdrop, we propose a method for generating a large
corpus of labeled 3D face identities, each with multiple instances, for
training, and a protocol for merging the most challenging existing 3D datasets
for testing. We also propose the first deep CNN model designed specifically for
3D face recognition, trained on 3.1 million 3D facial scans of 100K identities.
Our test dataset comprises 1,853 identities with a single 3D scan in the
gallery and another 31K scans as probes, which is several orders of magnitude
larger than existing ones. Without fine-tuning on this dataset, our network
already outperforms the state of the art in face recognition by over 10%. We
fine-tune our network on the gallery set to perform end-to-end large-scale 3D
face recognition, which further improves accuracy. Finally, we show the
efficacy of our method for the open-world face recognition problem.
Comment: 11 pages
Dense 3D Face Correspondence
We present an algorithm that automatically establishes dense correspondences
between a large number of 3D faces. Starting from automatically detected sparse
correspondences on the outer boundary of 3D faces, the algorithm triangulates
existing correspondences and expands them iteratively by matching points of
distinctive surface curvature along the triangle edges. After exhausting
keypoint matches, further correspondences are established by generating evenly
distributed points within triangles by evolving level set geodesic curves from
the centroids of large triangles. A deformable model (K3DM) is constructed from
the densely corresponded faces and an algorithm is proposed for morphing the
K3DM to fit unseen faces. This algorithm alternates between rigid alignment of
the unseen face and regularized morphing of the deformable model. We have
extensively evaluated the proposed algorithms on synthetic data and real 3D
faces from the FRGCv2, Bosphorus, BU3DFE and UND Ear databases using
quantitative and qualitative benchmarks. Our algorithm achieved dense
correspondences with a mean localisation error of 1.28 mm on synthetic faces
and detected anthropometric landmarks on unseen real faces from the FRGCv2
database with 3 mm precision. Furthermore, our deformable model fitting
algorithm achieved 98.5% face recognition accuracy on the FRGCv2 database and
98.6% on the Bosphorus database. Our dense model is also able to generalize to
unseen datasets.
Comment: 24 Pages, 12 Figures, 6 Tables and 3 Algorithms
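The fitting loop described in the abstract — alternating rigid alignment of the unseen face with regularized morphing of the deformable model — can be sketched in NumPy. This is an illustrative toy version, not the authors' K3DM implementation: the function name `fit_deformable_model`, the linear mean-plus-basis parameterisation, and the ridge regulariser `lam` are all assumptions for the sketch.

```python
import numpy as np

def fit_deformable_model(mean_shape, basis, target, iters=20, lam=1e-6):
    """Toy fit loop: alternate rigid (Procrustes) alignment of the target
    with ridge-regularised estimation of the deformation coefficients."""
    alpha = np.zeros(basis.shape[1])
    aligned = target.copy()
    for _ in range(iters):
        model = mean_shape + basis @ alpha          # current model instance, shape (3N,)
        X = aligned.reshape(-1, 3)
        Y = model.reshape(-1, 3)
        Xc, Yc = X - X.mean(0), Y - Y.mean(0)
        U, _, Vt = np.linalg.svd(Xc.T @ Yc)         # Kabsch rotation estimate
        if np.linalg.det(U @ Vt) < 0:               # guard against reflections
            U[:, -1] *= -1
        R = U @ Vt
        aligned = (Xc @ R + Y.mean(0)).reshape(-1)  # rigidly align target to model
        # regularised morphing: ridge solve for the coefficients
        A = basis.T @ basis + lam * np.eye(basis.shape[1])
        alpha = np.linalg.solve(A, basis.T @ (aligned - mean_shape))
    return alpha, aligned
```

On synthetic data generated from the model itself, the loop recovers the deformation coefficients and the rigid pose jointly, which mirrors the paper's iteration between alignment and morphing.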
Structural similarity loss for learning to fuse multi-focus images
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. Convolutional neural networks have recently been used for multi-focus image fusion. However, some existing methods resort to adding Gaussian blur to focused images to simulate defocus, thereby generating data (with ground truth) for supervised learning. Moreover, they classify pixels as ‘focused’ or ‘defocused’ and use the classified results to construct the fusion weight maps, which then necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach for directly predicting the fully focused output image from multi-focus input image pairs. The suggested approach uses a CNN architecture trained to perform fusion without the need for ground-truth fused images. The CNN exploits the image structural similarity (SSIM) to calculate the loss, a metric that is widely accepted for fused image quality evaluation. In addition, we use the standard deviation of a local window of the image to automatically estimate the importance of the source images in the final fused image when designing the loss function. Our network can accept images of variable sizes and hence we are able to utilize real benchmark datasets, instead of simulated ones, to train it. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes at test time. Extensive evaluation on benchmark datasets shows that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.
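The loss design the abstract describes — per-window SSIM against each source image, weighted by the local standard deviation so that the sharper source dominates — can be sketched in plain NumPy. This is a minimal illustration of the idea, not the paper's training code: the window size, the weight formula `std_a / (std_a + std_b)`, and the function names are assumptions.

```python
import numpy as np

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Single-window SSIM between two equally sized patches (values in [0, 1])."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def fusion_loss(fused, src_a, src_b, win=8):
    """Loss = 1 - mean over windows of std-weighted SSIM against the two sources."""
    h, w = fused.shape
    total, n = 0.0, 0
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            f = fused[i:i + win, j:j + win]
            a = src_a[i:i + win, j:j + win]
            b = src_b[i:i + win, j:j + win]
            sa, sb = a.std(), b.std()
            wa = sa / (sa + sb + 1e-12)  # sharper (higher-std) source gets more weight
            total += wa * ssim(f, a) + (1 - wa) * ssim(f, b)
            n += 1
    return 1.0 - total / n
```

Because the weight map is computed from the sources alone, the loss needs no ground-truth fused image, which is the property the abstract exploits to train on real benchmark data.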
Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach
Robust model fitting is a core algorithm in a large number of computer vision
applications. Solving this problem efficiently for datasets highly contaminated
with outliers is, however, still challenging due to the underlying
computational complexity. Recent literature has focused on learning-based
algorithms. However, most approaches are supervised, requiring a large amount
of labelled training data. In this paper, we introduce a novel unsupervised
learning framework that learns to directly solve robust model fitting. Unlike
other methods, our work is agnostic to the underlying input features and can
be easily generalized to a wide variety of LP-type problems with quasi-convex
residuals. We empirically show that our method outperforms existing
unsupervised learning approaches and achieves competitive results compared to
traditional methods on several important computer vision problems.
Comment: The preprint of a paper accepted to CVPR 202
Self-supervised learning to detect key frames in videos
Detecting key frames in videos is a common problem in many applications such as video classification, action recognition and video summarization. These tasks can be performed more efficiently using only a handful of key frames rather than the full video. Existing key frame detection approaches are mostly designed for supervised learning and require manual labelling of key frames in a large corpus of training data to train the models. Labelling requires human annotators from different backgrounds to annotate key frames in videos, which is not only expensive and time-consuming but also prone to subjective errors and inconsistencies between the labelers. To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video. Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet. The proposed ConvNet learns deep appearance and motion features to detect frames that are unique. The trained network is then able to detect key frames in test videos. Extensive experiments on the UCF101 human action and VSUMM video summarization datasets demonstrate the effectiveness of our proposed method.
A broad autism phenotype expressed in facial morphology
Autism spectrum disorder is a heritable neurodevelopmental condition diagnosed based on social and communication differences. There is strong evidence that cognitive and behavioural changes associated with clinical autism aggregate with biological relatives but in milder form, commonly referred to as the ‘broad autism phenotype’. The present study builds on our previous findings of increased facial masculinity in autistic children (Sci. Rep., 7:9348, 2017) by examining whether facial masculinity presents as a broad autism phenotype in 55 non-autistic siblings (25 girls) of autistic children. Using 3D facial photogrammetry and age-matched control groups of children without a family history of ASD, we found that facial features of male siblings were more masculine than those of male controls (n = 69; p < 0.001, d = 0.81 [0.36, 1.26]). Facial features of female siblings were also more masculine than the features of female controls (n = 60; p = 0.005, d = 0.63 [0.16, 1.10]). Overall, we demonstrated for males and females that facial masculinity in non-autistic siblings is increased compared to same-sex comparison groups. These data provide the first evidence for a broad autism phenotype expressed in a physical characteristic, which has wider implications for our understanding of the interplay between physical and cognitive development in humans.
SCOL: Supervised Contrastive Ordinal Loss for Abdominal Aortic Calcification Scoring on Vertebral Fracture Assessment Scans
Abdominal Aortic Calcification (AAC) is a known marker of asymptomatic
Atherosclerotic Cardiovascular Diseases (ASCVDs). AAC can be observed on
Vertebral Fracture Assessment (VFA) scans acquired using Dual-Energy X-ray
Absorptiometry (DXA) machines. Thus, the automatic quantification of AAC on VFA
DXA scans may be used to screen for CVD risks, allowing early interventions. In
this research, we formulate the quantification of AAC as an ordinal regression
problem. We propose a novel Supervised Contrastive Ordinal Loss (SCOL) by
incorporating a label-dependent distance metric with existing supervised
contrastive loss to leverage the ordinal information inherent in discrete AAC
regression labels. We develop a Dual-encoder Contrastive Ordinal Learning
(DCOL) framework that learns the contrastive ordinal representation at global
and local levels to improve the feature separability and class diversity in
latent space among the AAC-24 genera. We evaluate the performance of the
proposed framework using two clinical VFA DXA scan datasets and compare our
work with state-of-the-art methods. Furthermore, for predicted AAC scores, we
provide a clinical analysis to predict the future risk of a Major Acute
Cardiovascular Event (MACE). Our results demonstrate that this learning
enhances inter-class separability and strengthens intra-class consistency,
which results in predicting the high-risk AAC classes with high sensitivity and
high accuracy.
Comment: Accepted in conference MICCAI 202
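The core idea — reweighting a supervised contrastive loss by a label-dependent distance so that ordinally close classes sit closer in the embedding space — can be illustrated with a small NumPy sketch. The weight `exp(-|y_i - y_j|)` is one plausible label-dependent distance, not the paper's exact SCOL formulation, and `scol_sketch` is a hypothetical name.

```python
import numpy as np

def scol_sketch(z, y, tau=0.1):
    """Toy ordinal-weighted supervised contrastive loss.

    z : (n, d) embeddings (L2-normalised inside), y : (n,) ordinal labels.
    Pairs whose labels are ordinally close are pulled together more
    strongly than pairs whose labels are far apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    y = np.asarray(y, dtype=float)
    n = len(y)
    sim = z @ z.T / tau                              # temperature-scaled cosine similarities
    loss = 0.0
    for i in range(n):
        mask = np.arange(n) != i                     # exclude the anchor itself
        logits = sim[i, mask] - sim[i, mask].max()   # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum())
        w = np.exp(-np.abs(y[i] - y[mask]))          # label-dependent pair weights
        loss += -(w * log_prob).sum() / w.sum()
    return loss / n
```

Under this weighting, an embedding that groups samples by ordinal label scores a lower loss than one that mixes distant labels, which is the inter-class separability / intra-class consistency effect the abstract reports.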